
    A Fused Elastic Net Logistic Regression Model for Multi-Task Binary Classification

    Multi-task learning has been shown to significantly enhance the performance of multiple related learning tasks in a variety of situations. We present the fused logistic regression, a sparse multi-task learning approach for binary classification. Specifically, we introduce sparsity-inducing penalties over parameter differences of related logistic regression models to encode similarity across related tasks. The resulting joint learning task is cast into a form that lends itself to efficient optimization with a recursive variant of the alternating direction method of multipliers. We show results on synthetic data, describe the regime of settings where our multi-task approach achieves significant improvements over the single-task learning approach, and discuss the implications of applying the fused logistic regression in different real-world settings. Comment: 17 pages

    Markov Network Structure Learning via Ensemble-of-Forests Models

    Real world systems typically feature a variety of different dependency types and topologies that complicate model selection for probabilistic graphical models. We introduce the ensemble-of-forests model, a generalization of the ensemble-of-trees model. Our model enables structure learning of Markov random fields (MRF) with multiple connected components and arbitrary potentials. We present two approximate inference techniques for this model and demonstrate their performance on synthetic data. Our results suggest that the ensemble-of-forests approach can accurately recover sparse, possibly disconnected MRF topologies, even in the presence of non-Gaussian dependencies and/or low sample size. We applied the ensemble-of-forests model to learn the structure of perturbed signaling networks of immune cells and found that these frequently exhibit non-Gaussian dependencies with disconnected MRF topologies. In summary, we expect that the ensemble-of-forests model will enable MRF structure learning in other high dimensional real world settings that are governed by non-trivial dependencies. Comment: 13 pages, 6 figures

    Proteome coverage prediction with infinite Markov models

    Motivation: Liquid chromatography tandem mass spectrometry (LC-MS/MS) is the predominant method to comprehensively characterize complex protein mixtures such as samples from prefractionated or complete proteomes. In order to maximize proteome coverage for the studied sample, i.e. identify as many traceable proteins as possible, LC-MS/MS experiments are typically repeated extensively and the results combined. Proteome coverage prediction is the task of estimating the number of peptide discoveries of future LC-MS/MS experiments. Proteome coverage prediction is important to enhance the design of efficient proteomics studies. To date, there does not exist any method to reliably estimate the increase of proteome coverage at an early stage. Results: We propose an extended infinite Markov model DiriSim to extrapolate the progression of proteome coverage based on a small number of already performed LC-MS/MS experiments. The method explicitly accounts for the uncertainty of peptide identifications. We tested DiriSim on a set of 37 LC-MS/MS experiments of a complete proteome sample and demonstrated that DiriSim correctly predicts the coverage progression already from a small subset of experiments. The predicted progression enabled us to specify maximal coverage for the test sample. We demonstrated that quality requirements on the final proteome map impose an upper bound on the number of useful experiment repetitions and limit the achievable proteome coverage.

    Mixture-of-Experts Variational Autoencoder for Clustering and Generating from Similarity-Based Representations on Single Cell Data

    Clustering high-dimensional data, such as images or biological measurements, is a long-standing problem and has been studied extensively. Recently, Deep Clustering has gained popularity due to its flexibility in fitting the specific peculiarities of complex data. Here we introduce the Mixture-of-Experts Similarity Variational Autoencoder (MoE-Sim-VAE), a novel generative clustering model. The model can learn multi-modal distributions of high-dimensional data and use these to generate realistic data with high efficacy and efficiency. MoE-Sim-VAE is based on a Variational Autoencoder (VAE), where the decoder consists of a Mixture-of-Experts (MoE) architecture. This specific architecture allows for various modes of the data to be automatically learned by means of the experts. Additionally, we encourage the lower dimensional latent representation of our model to follow a Gaussian mixture distribution and to accurately represent the similarities between the data points. We assess the performance of our model on the MNIST benchmark data set and challenging real-world tasks of clustering mouse organs from single-cell RNA-sequencing measurements and defining cell subpopulations from mass cytometry (CyTOF) measurements on hundreds of different datasets. MoE-Sim-VAE exhibits superior clustering performance on all these tasks in comparison to the baselines as well as competitor methods. Comment: Submitted to PLOS Computational Biology

    Absolute quantification of microbial proteomes at different states by directed mass spectrometry

    The developed, directed mass spectrometry workflow allows the generation of consistent and system-wide quantitative maps of microbial proteomes in a single analysis. Application to the human pathogen L. interrogans revealed mechanistic proteome changes over time involved in pathogenic progression and antibiotic defense, and new insights about the regulation of absolute protein abundances within operons.

    Phase-specific signatures of wound fibroblasts and matrix patterns define cancer-associated fibroblast subtypes

    Healing wounds and cancers present remarkable cellular and molecular parallels, but the specific roles of the healing phases are largely unknown. We developed a bioinformatics pipeline to identify genes and pathways that define distinct phases across the time-course of healing. Their comparison to cancer transcriptomes revealed that a resolution phase wound signature is associated with increased severity in skin cancer and enriches for extracellular matrix-related pathways. Comparisons of transcriptomes of early- and late-phase wound fibroblasts vs skin cancer-associated fibroblasts (CAFs) identified an "early wound" CAF subtype, which localizes to the inner tumor stroma and expresses collagen-related genes that are controlled by the RUNX2 transcription factor. A "late wound" CAF subtype localizes to the outer tumor stroma and expresses elastin-related genes. Matrix imaging of primary melanoma tissue microarrays validated these matrix signatures and identified collagen- vs elastin-rich niches within the tumor microenvironment, whose spatial organization predicts survival and recurrence. These results identify wound-regulated genes and matrix patterns with prognostic potential in skin cancer.

    Novel Blood Vascular Endothelial Subtype-Specific Markers in Human Skin Unearthed by Single-Cell Transcriptomic Profiling

    Ample evidence pinpoints the phenotypic diversity of blood vessels (BVs) and site-specific functions of their lining endothelial cells (ECs). We harnessed single-cell RNA sequencing (scRNA-seq) to dissect the molecular heterogeneity of blood vascular endothelial cells (BECs) in healthy adult human skin and identified six different subpopulations, signifying arterioles, post-arterial capillaries, pre-venular capillaries, post-capillary venules, venules and collecting venules. Individual BEC subtypes exhibited distinctive transcriptomic landscapes associated with diverse biological pathways. These functionally distinct dermal BV segments were characterized by their unique compositions of conventional and novel markers (e.g., arteriole marker GJA5; arteriole capillary markers ASS1 and S100A4; pre-venular capillary markers SOX17 and PLAUR; venular markers EGR2 and LRG1), many of which have been implicated in vascular remodeling upon inflammatory responses. Immunofluorescence staining of human skin sections and whole-mount skin blocks confirmed the discrete expression of these markers along the blood vascular tree in situ, further corroborating BEC heterogeneity in human skin. Overall, our study molecularly refines individual BV compartments, whilst the identification of novel subtype-specific signatures provides more insights for future studies dissecting the responses of distinct vessel segments under pathological conditions.

    The dynamics of root cap sloughing in Arabidopsis is regulated by peptide signalling

    The root cap protects the stem cell niche of angiosperm roots from damage. In Arabidopsis, lateral root cap (LRC) cells covering the meristematic zone are regularly lost through programmed cell death, while the outermost layer of the root cap covering the tip is repeatedly sloughed. Efficient coordination with stem cells producing new layers is needed to maintain a constant size of the cap. We present a signalling pair, the peptide IDA-LIKE1 (IDL1) and its receptor HAESA-LIKE2 (HSL2), mediating such communication. Live imaging over several days characterized this process from initial fractures in LRC cell files to full separation of a layer. Enhanced expression of IDL1 in the separating root cap layers resulted in increased frequency of sloughing, balanced with generation of new layers in an HSL2-dependent manner. Transcriptome analyses linked IDL1-HSL2 signalling to the transcription factors BEARSKIN1/2 and genes associated with programmed cell death. Mutations in either IDL1 or HSL2 slowed down cell division, maturation and separation. Thus, IDL1-HSL2 signalling potentiates dynamic regulation of the homeostatic balance between stem cell division and sloughing activity.

    Similarities and Differences of Blood N-Glycoproteins in Five Solid Carcinomas at Localized Clinical Stage Analyzed by SWATH-MS

    Cancer is mostly incurable when diagnosed at a metastatic stage, making its early detection via blood proteins of immense clinical interest. Proteomic changes in tumor tissue may lead to changes detectable in the protein composition of circulating blood plasma. Using a proteomic workflow combining N-glycosite enrichment and SWATH mass spectrometry, we generate a data resource of 284 blood samples derived from patients with different types of localized-stage carcinomas and from matched controls. We examine whether the changes in the patient's plasma are specific to a particular carcinoma or represent a generic signature of proteins modified uniformly in a common, systemic response to many cancers. A quantitative comparison of the resulting N-glycosite profiles reveals that proteins related to blood platelets are common to several cancers (e.g., THBS1), whereas others are highly cancer-type specific. Available proteomics data, including a SWATH library to study N-glycoproteins, will facilitate follow-up biomarker research into early cancer detection.